Low-degree tests at large distances 



Alex Samorodnitsky*

February 2, 2008

Abstract

We define tests of boolean functions which distinguish between linear (or quadratic)
polynomials, and functions which are very far, in an appropriate sense, from these
polynomials. The tests have optimal or nearly optimal trade-offs between soundness and the
number of queries.

In particular, we show that functions with small Gowers uniformity norms behave
"randomly" with respect to hypergraph linearity tests.

A central step in our analysis of quadraticity tests is the proof of an inverse theorem for
the third Gowers uniformity norm of boolean functions.

The last result also has a coding theory application: it is possible to estimate efficiently
the distance from the second-order Reed-Muller code on inputs lying far beyond its
list-decoding radius.

1 Introduction 

This paper returns to the general question of the relation between number of queries and the 
probability of error in low-degree tests. 

The specific questions we deal with originate within the wider framework of Probabilistically
Checkable Proofs (PCPs). The PCP theorem [?, 3] states that it is possible to encode certificates
of satisfiability for SAT instances in such a way that a probabilistic verifier using a logarithmic
number of random bits can check the validity of the certificate with high probability of success,
after looking only at a constant number of bits in the encoding. We consider here only PCPs
with almost perfect completeness, which means that valid certificates are nearly always
accepted.^1 Given this, and fixing the number of queries $q$, we are interested in the best possible
soundness of the PCP, namely the probability $s$ of accepting an encoding of a false proof.

It is easy to see that, unless P = NP, the lower bound $s \ge 1/2^q$ must hold. Stronger
lower bounds were given in [15, 26]. The best known lower bound [7] is $s \ge \Omega(q/2^q)$. From
the other direction, the PCP theorem shows that we can achieve $s \le 2^{-\Omega(q)}$, and it was shown
in [?], following [?], that $s \le 2^{O(\sqrt{q})}/2^q$. In [21], assuming the Unique Games Conjecture [?], the
upper bound was improved to $s \le q/2^{q-1}$, which is of course (conditionally) best possible, up
to constants.

Let us say a few words on the structure of a PCP protocol. In the common paradigm [?], the
verifier of a PCP is split into two entities, the inner and the outer verifiers. Roughly speaking, 
the outer verifier chooses the (randomized) portion of the proof to be checked by the inner 
verifier. The inner verifier views the binary string it is given as a boolean function, and looks 
for a certain combinatorial pattern. If the pattern is not there the proof is rejected. If the inner 
verifier finds the appropriate property with non-negligible probability over its inputs, the outer 
verifier can then use this information to validate the PCP statement. 

Due to the gap structure inherent in the PCP construction, the decision of the inner verifier 
is usually dichotomic. This is to say it must accept if the property is satisfied and reject only 
if the function is very far, in the appropriate sense, from having the property. 

In this framework, an often considered property of a boolean function is that of being 
represented by a low-degree polynomial over a finite field. Here we deal only with the field of 
two elements and this representation is particularly simple: 

Definition 1.1: A boolean function $f : \{0,1\}^n \to \{-1,1\}$ has a degree-$d$ representation if
$f(x) = (-1)^{P(x)}$, where $P(x) = P(x_1, \ldots, x_n)$ is an $n$-variate polynomial of degree $d$ over $\mathbb{F}_2$.

In our version of the Low-Degree testing problem we are given oracle access to a boolean
function $f : \{0,1\}^n \to \{-1,1\}$ and we want to determine whether

1. The function $f$ can be represented by a degree-$d$ polynomial.

2. It is $(1/2 - \epsilon)$-far from any function with such a representation.



^1 See, e.g., [27] for a precise definition.



The distance between two functions is the fraction of points on which they disagree.

Low-degree tests we consider have perfect completeness, namely in case (1) they always
accept. We now define the soundness of a test.

Definition 1.2: A low-degree test has soundness $s$ if for any function $f$ that is $(1/2 - \epsilon)$-far from
degree-$d$ polynomials, the test accepts $f$ with probability at most $s + \phi(\epsilon)$, where $\phi(x) \to 0$
as $x \to 0$.
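To make the notions of representation and distance concrete, here is a minimal brute-force sketch for very small $n$. It is an illustration only: the encoding of points of $\{0,1\}^n$ as integers (with $x_1$ in the lowest bit) and all function names are assumptions made for this example, not part of the paper.

```python
def parity(v: int) -> int:
    """Parity of the number of 1 bits of v, i.e. a sum over F_2."""
    return bin(v).count("1") & 1


def distance(f, g, n: int) -> float:
    """Fraction of points of {0,1}^n on which f and g disagree."""
    size = 1 << n
    return sum(f(x) != g(x) for x in range(size)) / size


def distance_to_degree1(f, n: int) -> float:
    """Brute-force distance from f to the nearest function (-1)^(<a,x> + b)."""
    best = 1.0
    for a in range(1 << n):
        for b in (0, 1):
            g = lambda x, a=a, b=b: -1 if parity(a & x) ^ b else 1
            best = min(best, distance(f, g, n))
    return best


def linear(x: int) -> int:
    # f(x) = (-1)^(x2 + x4): a degree-1 representation with b = 0.
    return -1 if parity(0b1010 & x) else 1


def bent(x: int) -> int:
    # f(x) = (-1)^(x1*x2 + x3*x4): a degree-2 representation.
    p = ((x & 1) & ((x >> 1) & 1)) ^ (((x >> 2) & 1) & ((x >> 3) & 1))
    return -1 if p else 1


print(distance_to_degree1(linear, 4))  # 0.0
print(distance_to_degree1(bent, 4))    # 0.375
```

The exhaustive search costs on the order of $4^n$ evaluations, so it is only meant for checking small examples by hand; a tester in the sense of this paper looks at a constant number of points instead.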

Designing a low-degree test with a good trade-off between the number of queries and the 
soundness is a step towards a PCP construction. There are several ways in which such a result 
needs to be augmented to lead to a full PCP construction. We refer to the discussion in [?].
It seems, however, that in most cases in which this extension process succeeded, the obtained 
PCP inherited the relevant parameters (number of queries, soundness) of the low-degree test. 

Degree-1 (linear) tests with an asymptotically optimal trade-off between the number
of queries and the soundness were given in [23]. In the same paper these tests were extended
to PCP constructions with similar parameters.

A natural way to improve the PCP parameters further is to consider additional combinato- 
rial tests. 

In this paper we study degree-2 tests and relaxed versions of the degree-1 test. Here is a 
brief overview of our main results. 

• We define and analyze a degree-1 test with relaxed rejection criteria whose trade-off
between the number of queries and the soundness is asymptotically optimal and is much
better than that achievable by the standard linearity tests. A different (and easier)
analysis of this test was given in [21]. In that paper we were also able to extend the test
to a conditional PCP construction (assuming the Unique Games Conjecture [?]) with an
optimal number of queries vs. soundness trade-off.

• We define and analyze a degree-2 test with a very good trade-off between the number of
queries $q$ and the soundness $s$. (We conjecture this trade-off to be asymptotically optimal.)
A technical ingredient of this result has a natural interpretation in the framework of
error-correcting codes. We give a tight analysis of the acceptance probability of a natural local
test of [?] for the second-order Reed-Muller code at distances near the covering radius of
this code. As a consequence, it turns out to be possible to estimate efficiently the distance
from this code on inputs lying far beyond its list-decoding radius.

Our analysis of these tests is based on several technical assertions which could be of independent 
interest, and which we describe next. 

• We give a tight analysis of the Abelian Homomorphism testing problem for some families
of groups, including powers of $\mathbb{Z}_p$. The central technical claim, which we state here for the
special case of $p = 2$, is that if a function $\phi : \mathbb{Z}_2^n \to \mathbb{Z}_2^n$ satisfies $\phi(x + y) = \phi(x) + \phi(y)$
with probability bounded away from zero, then there is a matrix $D \in M_{n \times n}(\mathbb{Z}_2)$ such that
the linear transformation $x \mapsto Dx$ coincides with $\phi$ on a non-negligible fraction of the
inputs.



• We introduce and study the notion of a generalized average of a function $f$ over $\{0,1\}^n$.
A generalized average is a non-linear functional on the space of real (or complex) valued
functions on the boolean cube. It is associated with a binary matrix $M$ and it measures
the average over a certain family of subsets of $\{0,1\}^n$, defined by $M$, of products of $f$ over
each subset. Generalized averages arise naturally in the analysis of low-degree tests. An
important special case is when this family consists of all the affine subsets of $\{0,1\}^n$ of a
fixed dimension $d$. The generalized average in this case turns out to measure (a power of)
a norm of the function $f$. These norms are the Gowers uniformity norms [11] and they
measure, in a certain sense, the proximity of the function to a polynomial of degree $d$.

– We show that a function with a large third uniformity norm is somewhat close to an
$n$-variate quadratic polynomial over $\mathbb{F}_2$. Similar results for finite Abelian groups of
cardinality indivisible by 6 have been independently proved in [14].

– We show that functions on which the hypergraph linearity tests defined in [?] fail
with non-negligible probability have large uniformity norms.

– We observe that functions with small uniformity norms are pseudorandom in the
sense of [11], and briefly discuss pseudorandom properties of such functions in our
context.

In the next sections we give a more detailed description of the background and of the results 
in this paper. The proofs are given in the Appendices. 

Organization 

We describe relaxed linearity tests in Section 2. Degree-2 tests and properties of the
Reed-Muller code of order 2 are described in Section 3. Abelian homomorphism testing is discussed in
Section 4. Section 5 gives more details on the technical tools used, in particular their connections
with recent work in additive number theory. A notion of pseudorandomness of boolean functions
which comes from additive number theory is introduced and briefly discussed.

2 Degree-1 Tests 

This is the simplest and the most useful case in practice. A boolean function $f$ has a degree-1
representation if $f(x) = (-1)^{\langle a, x \rangle + b}$, where $a$ is a fixed vector in $\{0,1\}^n$, and $b$ is a fixed constant
in $\{0,1\}$. Hence in this case the tester has to decide whether the function is linear^2 or is far
from every linear function.

A simple linearity test with three queries was defined in [?].^3



Choose uniformly at random $x, y \in \{0,1\}^n$
If $f(x) f(y) f(x + y) = 1$
then accept
else reject


^2 Or rather affine. In practice the function is usually tested for linearity ($b = 0$). The two testing problems are
essentially equivalent, and we occasionally will, with some abuse of meaning, refer to both as linearity testing
problems.

^3 We observe that to transform this test to an affinity (degree-1) test, it suffices to replace 1 with $f(0)$ in the
definition of the test.



It is shown in [?] that if this test accepts $f$ with probability $1/2 + \delta$ then $f$ is $(1/2 - \delta)$-close to
a linear function. Therefore, according to our definition, this test has soundness $s = 1/2$.

Independent repetition of $q/3$ basic tests leads to a test with $q$ queries and soundness
$s = 2^{-q/3}$. To improve the trade-off between $q$ and $s$, more complex tests have to be considered.
It turns out that it is possible to associate such a test with any given graph. Fix a graph
$G = (V, E)$ on $t$ vertices. The following test is a dependent combination of the basic tests of [?].



Choose uniformly at random $x_1, \ldots, x_t \in \{0,1\}^n$
If $\prod_{i \in e} f(x_i) \cdot f\left(\sum_{i \in e} x_i\right) = f(0)$ for all $e \in E$
then accept
else reject





These graph tests were defined in [?]. A graph test associated with a graph $G = (V, E)$ runs
$|E|$ correlated copies of the basic linearity test. In [?] it was shown that for functions which
are far from degree-1 polynomials (this is to say, have small Fourier coefficients), these copies
of the basic test behave essentially independently. More precisely, the soundness of this test is
$s = 1/2^{|E|}$. Of course, the total number of queries is $q = |V| + |E|$. In particular, choosing $G$
to be the complete graph on $t$ vertices, we obtain an affinity test with $q = \binom{t}{2} + t$ and
$s = 2^{-\binom{t}{2}}$. Since $s = 2^t/2^q$ and $t \approx \sqrt{2q}$, this means that $s \approx 2^{\sqrt{2q}}/2^q$.

A natural generalization of graph tests to hypergraph tests was given in [?]. Let $H = (V, E)$
be a hypergraph on $t$ vertices and consider the following test:



Choose uniformly at random $x_1, \ldots, x_t \in \{0,1\}^n$
If $\prod_{i \in e} f(x_i) = f\left(\sum_{i \in e} x_i\right)$ for all $e \in E$
then accept
else reject
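A hypergraph linearity test of this kind can be sketched as follows. The sketch assumes that the check performed for each hyperedge $e$ is $\prod_{i \in e} f(x_i) = f(\sum_{i \in e} x_i)$, which every linear function passes; the integer encoding of points, the function names and the choice of hypergraph are likewise assumptions made for illustration.

```python
import random
from functools import reduce
from itertools import combinations
from operator import xor


def parity(v: int) -> int:
    return bin(v).count("1") & 1


def hypergraph_test(f, n: int, t: int, edges, rng) -> bool:
    """One run of the test: |V| = t random points, one check per hyperedge."""
    xs = [rng.getrandbits(n) for _ in range(t)]
    for e in edges:
        lhs = 1
        for i in e:
            lhs *= f(xs[i])
        # Compare with f evaluated at the sum (XOR) of the points of the edge.
        if lhs != f(reduce(xor, (xs[i] for i in e))):
            return False
    return True


def linear(x: int) -> int:
    return -1 if parity(0b0110 & x) else 1


rng = random.Random(1)
# Complete 3-uniform hypergraph on 5 vertices: 10 edges, 5 + 10 = 15 queries.
edges = list(combinations(range(5), 3))
print(all(hypergraph_test(linear, 4, 5, edges, rng) for _ in range(1000)))  # True
```

Each run reads $f$ at the $t$ sampled points and at one additional point per hyperedge, which matches the query count $|V| + |E|$ used in the text.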



A hypergraph test runs $|E|$ copies of the basic linearity test, where $|E|$ is now the number
of hyper-edges. Unfortunately, it is not true that, for functions far from degree-1 polynomials,
these copies behave independently. Consider the function $f(x) = (-1)^{x_1 x_2 + x_3 x_4 + \cdots + x_{n-1} x_n}$.
This function is maximally far from all degree-1 polynomials (it is a bent function), but any hypergraph
test with $q = |V| + |E|$ queries accepts this function with probability at least $2^{\Omega(\sqrt{q})}/2^q$ [23]. More
generally, we show in [24] that this is true for any non-adaptive linearity test that always accepts
linear functions.

Our results 

The starting point of this work was the realization that the function $f$ we have described
is a quadratic polynomial and that it is accepted by a hypergraph test with non-negligible 
probability, because, roughly speaking, the basic ingredient of this test takes a discrete derivative 
of the tested function and compares it to zero. The order of the derivative is essentially given 



by the cardinality of the hyperedges. We will say more about this in Section El and in the 
full version of the paper. The natural question then is whether quadratic polynomials, and, 
more generally, low-degree polynomials, are the only obstructions to better performance by 
hypergraph linearity tests. 

We give a partial (affirmative) answer to this question for general hypergraphs. We are able 
to answer this question completely for hypergraphs of maximal edge-size 3 and for quadratic 
polynomials. The answer is again positive. We conjecture the answer to be positive for general 
hypergraphs and low-degree polynomials. 

We prove two claims. These are the main technical results of this paper. 

The first claim is valid for any hypergraph. First, we define Gowers uniformity norms. 

Definition 2.1: Let $f : \{0,1\}^n \to \mathbb{R}$ be a function, and $d \ge 1$ be an integer. The $d$-th Gowers
uniformity norm (for the group $\mathbb{Z}_2^n$) is given by

$$\|f\|_{U_d} = \left( \mathbb{E}_{x, y_1, \ldots, y_d} \prod_{S \subseteq [d]} f\left(x + \sum_{i \in S} y_i\right) \right)^{1/2^d}.$$

Here $x, y_1, \ldots, y_d$ are chosen uniformly and independently at random from $\{0,1\}^n$.
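For very small $n$ and $d$ the uniformity norm can be evaluated directly from the definition. The following brute-force sketch is meant only as a sanity check; the integer encoding of $\{0,1\}^n$ (addition as XOR) and the function names are assumptions of this example.

```python
from itertools import product


def gowers_norm(f, n: int, d: int) -> float:
    """d-th uniformity norm of f on {0,1}^n, straight from the definition.

    Enumerates all (x, y_1, ..., y_d), so the cost is about 2^(n(d+1)) * 2^d.
    """
    size = 1 << n
    total = 0
    for x in range(size):
        for ys in product(range(size), repeat=d):
            term = 1
            for mask in range(1 << d):          # mask encodes a subset S of [d]
                shift = 0
                for i in range(d):
                    if (mask >> i) & 1:
                        shift ^= ys[i]
                term *= f(x ^ shift)
            total += term
    mean = total / size ** (d + 1)
    return mean ** (1 / 2 ** d)


def chi(x: int) -> int:
    # The character (-1)^(x1).
    return -1 if x & 1 else 1


def quad(x: int) -> int:
    # The quadratic (-1)^(x1*x2).
    return -1 if (x & 1) and (x >> 1) & 1 else 1


print(gowers_norm(chi, 2, 2))   # 1.0
print(gowers_norm(quad, 2, 3))  # 1.0
print(round(gowers_norm(quad, 2, 2), 4))  # 0.7071
```

The outputs illustrate the intuition discussed later in the paper: a degree-1 function has second uniformity norm 1, and a quadratic has third uniformity norm 1 but a strictly smaller second norm.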

Theorem 2.2: Let $H = (V, E)$ be a hypergraph with maximal edge-size $d$. Then the probability
that the linearity test associated with $H$ accepts a boolean function $f$ is bounded by

$$\frac{1}{2^{|E|}} + \|f\|_{U_d}.$$

Another (and easier) proof of this theorem and its generalization to several functions is given
in [?].



The second claim is that a boolean function with a large third uniformity norm is somewhat
close to a quadratic polynomial.

Theorem 2.3: Let $f : \{0,1\}^n \to \{-1,1\}$ be a function such that $\|f\|_{U_3} \ge \epsilon$. Then there exists
a quadratic polynomial $g$ such that the distance between $f$ and $g$ is at most $1/2 - \epsilon'$. Here one
can choose $\epsilon' \ge \Omega\left(\exp\{-(1/\epsilon)^C\}\right)$ for an absolute constant $C$.

Consider the following relaxed degree-1 testing problem. Given oracle access to a boolean
function $f : \{0,1\}^n \to \{-1,1\}$ and an integer $d \ge 2$ we want to determine whether

1. The function $f$ can be represented by a degree-1 polynomial.

2. $\|f\|_{U_d} < \epsilon$.

Once again we want tests with perfect completeness. The soundness of the test is defined as in
Definition 1.2.



Remark 2.4: Let us point out that this test is indeed a relaxation of the standard degree-1
test. It is known [11] that the uniformity norms $\|f\|_{U_d}$ of $f$ are monotone increasing in $d$. It is easy
to see that the second uniformity norm is the same as the $\ell_4$ norm of the Fourier transform of
$f$: $\|f\|_{U_2} = \left( \sum_{\alpha \in \{0,1\}^n} \hat{f}(\alpha)^4 \right)^{1/4} \ge \max_{\alpha \in \{0,1\}^n} |\hat{f}(\alpha)|$. This means that the functions the test
has to reject are at least $(1/2 - \epsilon/2)$-far from degree-1 polynomials.
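The identity used in this remark can be verified in a few lines. The derivation below is a sketch that assumes the usual normalization $\hat f(\alpha) = \mathbb{E}_x f(x)\chi_\alpha(x)$ with $\chi_\alpha(x) = (-1)^{\langle \alpha, x\rangle}$, which this section does not spell out.

```latex
\begin{align*}
\|f\|_{U_2}^4
  &= \mathbb{E}_{x,y_1,y_2}\, f(x)\,f(x+y_1)\,f(x+y_2)\,f(x+y_1+y_2) \\
  &= \mathbb{E}_{y_1}\Bigl(\mathbb{E}_{x}\, f(x)\,f(x+y_1)\Bigr)^{2}
   = \mathbb{E}_{y_1}\Bigl(\sum_{\alpha}\hat f(\alpha)^2\,\chi_\alpha(y_1)\Bigr)^{2}
   = \sum_{\alpha}\hat f(\alpha)^4
   \;\ge\; \max_{\alpha}\hat f(\alpha)^4 , \\[4pt]
\mathrm{dist}\bigl(f, \pm\chi_\alpha\bigr)
  &= \frac{1 \mp \hat f(\alpha)}{2}
   \;\ge\; \frac{1 - \|f\|_{U_2}}{2}
   \;\ge\; \frac{1 - \|f\|_{U_d}}{2}
   \qquad (d \ge 2).
\end{align*}
```

The second equality in the first chain uses that, for fixed $y_1$, the points $x$ and $x + y_2$ are independent and uniform; the last equality there is Parseval's identity. The final inequality uses the monotonicity of the uniformity norms in $d$.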

It is a direct consequence of Theorem 2.2 that hypergraph tests solve the relaxed testing problem
with the "right" soundness.

Theorem 2.5: Let $d \ge 2$ and let $H = (V, E)$ be a hypergraph with maximal edge-size $d$. Then
the hypergraph linearity test associated with $H$ solves the relaxed degree-1 testing problem with
perfect completeness and soundness $1/2^{|E|}$.

Choosing $H$ to be a complete $d$-uniform hypergraph on $t \approx q^{1/d}$ vertices leads to a test with
$q$ queries and soundness $s \le 2^{O(q^{1/d})}/2^q$. This trade-off is shown to be asymptotically optimal in [?].

It remains to observe that Theorem 2.5 together with Theorem 2.3 imply that the complete
3-uniform hypergraph test distinguishes between linear functions and functions which are far
from quadratic polynomials with optimal soundness of $s \le 2^{O(q^{1/3})}/2^q$.
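The arithmetic behind this trade-off is easy to tabulate. The sketch below assumes, as in the text, that a hypergraph linearity test makes $q = |V| + |E|$ queries and has soundness $2^{-|E|}$; the function name is hypothetical.

```python
from math import comb


def complete_hypergraph_parameters(t: int, d: int):
    """Queries and number of edges for the complete d-uniform hypergraph on t vertices."""
    edges = comb(t, d)
    queries = t + edges          # one query per vertex and one per hyperedge
    return queries, edges


for t in (6, 10, 20):
    q, e = complete_hypergraph_parameters(t, 3)
    # Soundness 2^-|E| equals 2^(q - |E|) / 2^q = 2^t / 2^q.
    print(f"t={t}: q={q}, soundness = 2^-{e} = 2^{q - e}/2^{q}")
```

Since $q - |E| = t$ and $t$ grows like $q^{1/d}$, the soundness has the form $2^{O(q^{1/d})}/2^q$ quoted above.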

3 Second-Order Reed-Muller Codes 

A binary error-correcting code $C$ of length $N$ and normalized distance $\delta$ is a subset of $\{0,1\}^N$
in which any two distinct elements disagree on at least a $\delta$-fraction of the domain (the
coordinates). This allows for error-correction: a corrupted codeword (element of the code) with less
than a $\delta/2$-fraction of errors can, in principle, be recovered by going to the unique nearest
element of the code. We call $\delta/2$ the unique-decoding radius of the code.

Finding the nearest codeword can be computationally hard. Here we are interested in 
efficient error-correction. 

An important example of a code of length $N = 2^n$ is the subset of $\{0,1\}^N$ whose elements
are evaluations of $n$-variate degree-$d$ polynomials over $\mathbb{F}_2$. This is the Reed-Muller (RM) code
of order $d$. Efficient error-correcting algorithms for RM codes were given in [21].

One can go beyond unique decoding. It is an easy consequence of the Johnson bound
for constant-weight codes [?] that there is $\Delta > \delta/2$ such that there could be only a few
(polynomially many in $N$) codewords at distance $\Delta$ from a corrupted codeword. We call the maximal
$\Delta$ with this property the list-decoding radius of the code. For many codes there are efficient
list-decoding algorithms [?] that, for any $\Delta$ smaller than the list-decoding radius, recover all the
codewords within distance $\Delta$ from the corrupted codeword. To the best of our knowledge there
are no such algorithms for binary RM codes of order larger than 1.

Another useful property of a code is local testability [?]. A code is locally testable if there
exists an efficient randomized algorithm (test) which, given access to a putative codeword
$f \in \{0,1\}^N$, examines a finite number of coordinates of $f$ and decides whether $f$ is a codeword.
We want the test always to accept valid codewords, and to minimize the probability $s$ of
accepting an invalid codeword, given the number of queries $q$. The questions we discuss in this
paper fall naturally into the framework of local testability of Reed-Muller codes. In fact, we
deal with a special case (a promise problem) in which the putative codeword is promised either
to lie in the code or to be $(1/2 - \epsilon)$-far from the code. We remark that, in the general case, the
probability the test accepts an invalid codeword will necessarily depend also on its distance
from the code.

Example 3.1: A good example for the notions we have discussed is the first order Reed-Muller
code, also known as the Hadamard code. The distance of this code is 1/2, and therefore its
unique-decoding radius is 1/4. However, it is efficiently list-decodable for any distance $\Delta < 1/2$ [?].

The Hadamard code is also locally testable. In fact, the basic linearity test of [?] is a good
3-query local test for this code. [1] studies the dependence of the probability this test accepts
an invalid codeword on its distance from the code. For distances close to 1/2 the analysis is
tight, and the probability of acceptance is shown to be upper bounded by 1 minus the distance.

Local testability of Reed-Muller codes of any fixed order $d$ was proved in [?]. The basic test
in [?] (presented here with a small twist to adapt it to our setting) chooses independently at
random $d + 2$ vectors $x, y_1, \ldots, y_{d+1}$ in $\{0,1\}^n$, and computes the product of the tested function
over the $(d+1)$-dimensional affine subspace of $\{0,1\}^n$ given by $x + \mathrm{Span}(y_1, \ldots, y_{d+1})$. If the product
is 1 the test accepts. Otherwise it rejects. This is a natural generalization of the linearity test
of [2]. While that test can be interpreted as taking a random second directional derivative and
checking whether it vanishes, the test of [?] amounts to checking whether a random derivative
of order $d + 1$ vanishes. [1] studies the dependence of the probability this test accepts an
invalid codeword $f$ on its distance from the code. (We observe that this probability is precisely
$\frac{1 + \|f\|_{U_{d+1}}^{2^{d+1}}}{2}$, cf. Definition 2.1.) In particular it is shown that, for distances larger than $2^{-d}$, the
probability of acceptance is upper bounded by $1 - \Omega\left(\frac{1}{d 2^d}\right)$. Thus, for $d = 2$, the probability
of acceptance is upper bounded by some constant smaller than 1.
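The link between the acceptance probability of this test and the uniformity norm follows from a one-line computation, sketched here under the definition of the $(d+1)$-th uniformity norm given in Section 2. Since $f$ takes values in $\{-1, 1\}$, the product over the subspace is $\pm 1$, so the indicator of acceptance is $(1 + \text{product})/2$.

```latex
\begin{align*}
\Pr[\text{accept}]
  &= \mathbb{E}_{x,y_1,\ldots,y_{d+1}}
     \left[\frac{1 + \prod_{S \subseteq [d+1]} f\bigl(x + \sum_{i \in S} y_i\bigr)}{2}\right] \\
  &= \frac{1}{2} + \frac{1}{2}\,
     \mathbb{E}_{x,y_1,\ldots,y_{d+1}} \prod_{S \subseteq [d+1]} f\Bigl(x + \sum_{i \in S} y_i\Bigr)
   = \frac{1 + \|f\|_{U_{d+1}}^{2^{d+1}}}{2}.
\end{align*}
```

In particular, for $d = 2$ an acceptance probability of at least $1/2 + \epsilon$ is the same as $\|f\|_{U_3}^{8} \ge 2\epsilon$.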

Our results 

We study the probability of error of the test of [?] for the second-order Reed-Muller code
and for distances close to 1/2. We provide a tight analysis for this case, showing this probability
to be essentially upper bounded by 1 minus the distance. Specifically, by Theorem 2.3, if this
probability is larger than $1/2 + \epsilon$ then there is a quadratic polynomial whose distance from the
tested function is at most $1/2 - \epsilon'$.

Our result has the following coding interpretation. Although the list-decoding radius of the
second-order Reed-Muller code is 1/4 [?], it is possible to determine whether the distance of
a given function $f \in \{0,1\}^N$ from the code is strictly smaller than the covering radius of the
code, which is $1/2 - o(1)$.^4 More precisely, we have the following proposition.

^4 This is also, with overwhelming probability, the typical distance of an element of $\{0,1\}^N$ from the code.



Proposition 3.2: There is a positive constant $C$ such that, given a function $f : \{0,1\}^n \to \{-1,1\}$,
and a parameter $\delta > 0$, it is possible to determine, with probability arbitrarily close to
1, and in time linear in j, which of the two following (mutually non-exclusive) options holds: 

• The distance of f from the quadratic polynomials is at least 2 ~ ^ {^^ ) ■ 

• The distance of $f$ from the quadratic polynomials is at most $1/2 - \exp\{-(1/\delta)^C\}$.

Combining Theorems 2.5 and 2.3 leads to our main result in this section, an analysis of
hypergraph degree-2 (quadraticity) tests.

Given a 3-uniform hypergraph $H = ([t], E)$ on $t$ vertices, the test is defined as follows.



Choose uniformly at random $x_1, \ldots, x_t \in \{0,1\}^n$
If for all $e = \{i, j, k\} \in E$ holds

$f(x_i) f(x_j) f(x_k) f(x_i + x_j) f(x_i + x_k) f(x_j + x_k) f(x_i + x_j + x_k) = f(0)$

then accept
else reject
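The quadraticity test can be sketched along the same lines as the linearity tests. The code below is an illustration under assumptions: points of $\{0,1\}^n$ are integers with XOR as addition, and the helper names and sample functions are chosen for the example. For a single hyperedge it also computes the exact acceptance probability by enumeration.

```python
import random
from itertools import combinations, product


def edge_check(f, xi: int, xj: int, xk: int) -> bool:
    """The seven-point check performed for one hyperedge {i, j, k}."""
    points = (xi, xj, xk, xi ^ xj, xi ^ xk, xj ^ xk, xi ^ xj ^ xk)
    value = 1
    for p in points:
        value *= f(p)
    return value == f(0)


def quadraticity_test(f, n: int, t: int, edges, rng) -> bool:
    """One run of the hypergraph quadraticity test."""
    xs = [rng.getrandbits(n) for _ in range(t)]
    return all(edge_check(f, xs[i], xs[j], xs[k]) for (i, j, k) in edges)


def exact_edge_acceptance(f, n: int) -> float:
    """Exact probability that a single hyperedge check passes (small n only)."""
    size = 1 << n
    triples = product(range(size), repeat=3)
    return sum(edge_check(f, *tr) for tr in triples) / size ** 3


def quadratic(x: int) -> int:
    # (-1)^(x1*x2 + x3)
    p = ((x & 1) & ((x >> 1) & 1)) ^ ((x >> 2) & 1)
    return -1 if p else 1


def cubic(x: int) -> int:
    # (-1)^(x1*x2*x3)
    return -1 if (x & 7) == 7 else 1


print(exact_edge_acceptance(quadratic, 3))  # 1.0
print(exact_edge_acceptance(cubic, 3))      # 0.671875
```

For the cubic monomial the check fails exactly when the three sampled vectors are linearly independent, which happens for 168 of the 512 triples in $\{0,1\}^3$; this accounts for the value $344/512$.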



Theorem 3.3: Let $H = (V, E)$ be a 3-uniform hypergraph. Then the hypergraph quadraticity
test solves the degree-2 testing problem with perfect completeness and soundness $1/2^{|E|}$.

Choosing $H$ to be a complete 3-uniform hypergraph on $t \approx q^{1/3}$ vertices leads to a test with $q$
queries and soundness $s \le 2^{O(q^{2/3})}/2^q$.

Discussion 

Analyzing the acceptance probability of a low-degree test at distances larger than the
unique-decoding radius seems to require a different set of techniques. In general, to prove that a code is
locally testable, one needs to upper bound the acceptance probability by a function of the distance.
This is achieved by showing that if the acceptance probability of the test on an element $f \in \{0,1\}^N$
is higher than a certain threshold, there is a codeword $g$ not far from $f$. In most cases the test
itself is used to efficiently "decode" $f$, viewed as a corrupted codeword, to the unique nearest
codeword $g$. This approach is harder to implement when there are several possible codewords
to choose from, and symmetry breaking is required. The only example we are aware of is the
Hadamard code. In this case one is assisted by the fact that the elements of the code are
pairwise orthogonal (as vectors over the reals). In particular, for any $\epsilon > 0$, there could be only
a constant number of codewords at distance smaller than $1/2 - \epsilon$ from $f$. This no longer holds
for degree-2 polynomials. For instance, the list-decoding radius here is 1/4. Our main tools in
this case are harmonic analysis and additive number theory. In fact, a significant part of our
proof follows the approach of Gowers [11] in his proof of Szemerédi's theorem for arithmetic
progressions of length 4.



4 Abelian homomorphism testing 

Let $G$ and $H$ be two finite Abelian groups. In the Abelian Homomorphism testing problem we
are given oracle access to a transformation $\phi : G \to H$ and we have to decide whether $\phi$
is a homomorphism or is at least $\delta$-far from any homomorphism between $G$ and $H$. This
problem is a generalization of the linearity testing problem, in which case $G = \mathbb{Z}_2^n$ and $H = \mathbb{Z}_2$. It was
first studied in [?], where the following natural generalization of the basic linearity test was
suggested: choose $x, y \in G$ at random and check whether $\phi(x + y) = \phi(x) + \phi(y)$. The analysis
of this test leads to the following question.

Let $\phi : G \to H$ be such that the group law for $\phi$ holds with positive probability:

$$\Pr_{x, y \in G}\left(\phi(x) + \phi(y) = \phi(x + y)\right) \ge \epsilon.$$

Let $\rho$ be the maximal $\epsilon'$ such that there exists a homomorphism $\psi$ from $G$ to $H$ such that
$\Pr_x\left(\phi(x) = \psi(x)\right) \ge \epsilon'$. The question is whether $\rho$ can be lower bounded in terms of a function
of $\epsilon$ that is independent of $|G|$. In [?] this is shown to be true if $\epsilon > 7/9$. This lower bound on
$\epsilon$ is also necessary [?].

If both $G$ and $H$ are powers of $\mathbb{Z}_2$, the lower bound on $\epsilon$ was relaxed to $\epsilon > 83/128$ [?].

Our results 



We show the following theorem to be a simple consequence of two results [?, 22] in additive
number theory.

Theorem 4.1: Let $p$ be a prime number, and let $\epsilon > 0$. Let $G$ be a $p$-group of order $r$ and let
$H$ be a power of $\mathbb{Z}_p$. Let $\phi : G \to H$ be such that

$$\Pr_{x, y \in G}\left(\phi(x) + \phi(y) = \phi(x + y)\right) \ge \epsilon.$$

Then there exists a homomorphism $\psi : G \to H$ such that

$$\Pr_{x \in G}\left\{\phi(x) = \psi(x)\right\} \ge c \cdot r^{-c' \cdot \epsilon^{-c''}},$$

where $c, c', c''$ are absolute constants (independent of the groups $G, H$).

In particular, if both $G$ and $H$ are powers of $\mathbb{Z}_2$, $\rho$ can be lower bounded by a function of $\epsilon$, for
any $\epsilon > 0$. In testing terms, this means that the acceptance probability of the basic test of [?]
goes to zero as the distance from the code (the set of all homomorphisms) goes to one.



5 Tools 

In this section we discuss the technical tools used in this paper. We believe these tools, and 
their connection to recent results in additive number theory, might be of independent interest. 



5.1 Generalized averages 

Let $H = (V, E)$ be a hypergraph on $t$ vertices. Given a boolean function $f : \{0,1\}^n \to \{-1,1\}$,
the acceptance probability of the linearity test associated with $H$ on $f$ is easily seen
(cf. Appendix 7) to be an average of expressions of the following type. Let $S = \{e_1, \ldots, e_k\}$ be
a family of edges of $H$, this is to say subsets of $\{1, \ldots, t\}$. We define the average of $f$ on $S$ in
the following way:

$$\mathbb{E}_S(f) = \mathbb{E}_{x_1, \ldots, x_t} \prod_{e \in S} f\left(\sum_{i \in e} x_i\right).$$

The operator $\mathbb{E}_S$ is naturally associated with a binary matrix $A$ whose columns are characteristic
vectors of the $e_j$. We will also denote this operator by $\mathbb{E}_A$.



Example 5.1: Let $A = \begin{pmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \end{pmatrix}$. Then $\mathbb{E}_A(f) = \mathbb{E}_{x,y} f(x) f(y) f(x + y)$ is the basic linearity
test of [?].

For $A = [1]$, the average of $f$ over $A$ is, of course, simply the expectation $\mathbb{E} f$. The notion of
generalized average is naturally extended to real or complex valued functions on $\{0,1\}^n$.
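A generalized average can be computed by brute force for tiny parameters. The sketch below assumes the reading suggested by Example 5.1: each column of the $t \times k$ binary matrix $A$ selects a subset of the variables $x_1, \ldots, x_t$, and $\mathbb{E}_A(f)$ is the expectation of the product of $f$ over the corresponding sums. The encoding and the function names are assumptions made for illustration.

```python
from itertools import product


def generalized_average(f, n: int, A) -> float:
    """E_A(f): average over x_1..x_t of the product, over columns c of A,
    of f evaluated at the sum (XOR) of the x_i with c_i = 1."""
    t, cols, size = len(A), len(A[0]), 1 << n
    total = 0
    for xs in product(range(size), repeat=t):
        term = 1
        for j in range(cols):
            point = 0
            for i in range(t):
                if A[i][j]:
                    point ^= xs[i]
            term *= f(point)
        total += term
    return total / size ** t


def quad(x: int) -> int:
    # (-1)^(x1*x2) on {0,1}^2
    return -1 if (x & 1) and (x >> 1) & 1 else 1


print(generalized_average(quad, 2, [[1]]))                     # 0.5
print(generalized_average(quad, 2, [[1, 0, 1], [0, 1, 1]]))    # 0.25
```

The first value is the plain expectation of the function, and the second is the quantity $\mathbb{E}_{x,y} f(x) f(y) f(x+y)$ from Example 5.1.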

The analysis of the probability of acceptance of a hypergraph test entails studying gen- 
eralized averages of functions. In particular, we would like to upper bound such averages by 
expressions which are convenient to deal with. 

With this in mind, we define a useful family of binary matrices. 

Definition 5.2: For an integer $k \ge 1$ let $A_k$ be a $(k + 1) \times 2^k$ matrix of the following form: the
last row of $A_k$ is an all-1 vector. Removing this last row gives a $k \times 2^k$ matrix whose columns
are all binary vectors of length $k$ (in an arbitrary order).

Observe that $\mathbb{E}_{A_k}(f)$ is precisely $\|f\|_{U_k}^{2^k}$.

We prove several properties of generalized averages in Appendix 7, leading to the following
main claim. This is essentially a restatement of Theorem 2.2.

Theorem 5.3: Assume that all the columns in $A$ are distinct and have at most $k$ ones. Then
for any Boolean function $f$

$$\mathbb{E}_A(f) \le \left(\mathbb{E}_{A_k}(f)\right)^{1/2^k} = \|f\|_{U_k}.$$



5.2 Gowers norms and pseudorandomness 

In the previous subsection we have seen how Gowers uniformity norms $\|\cdot\|_{U_k}$ appear naturally in
the analysis of linearity tests. These norms were originally defined in [11] and were instrumental
in the new proof of Szemerédi's theorem on arithmetic progressions given in that paper. We
refer to [11], [?] for a more detailed discussion. Here let us briefly mention that, intuitively,
the $k$-th uniformity norm of a function is high if this function has a non-negligible correlation
with a polynomial of degree $k - 1$. This is to say, this function has a non-trivial combinatorial
structure. On the other hand, if a function has small uniformity norms, we would like to deduce
that it is 'pseudorandom', in an appropriate sense. In particular, it is shown in [11] that if a
characteristic function of a subset of the integers has small uniformity norms, then the number
of arithmetic progressions it contains is similar to that contained by a random subset of the
same size.

This notion of pseudorandomness naturally generalizes (and strengthens) the standard
notion of a boolean function (or a set) being pseudorandom if its non-zero Fourier coefficients are
small. In fact, the maximal size of a Fourier coefficient is controlled by the second uniformity
norm. Since uniformity norms are monotone increasing, if a function has a small $k$-th
uniformity norm, $k \ge 2$, it is also pseudorandom in the usual sense. This is, of course, intuitively
clear, since a function far from degree-$(k - 1)$ polynomials is, in particular, far from linear
polynomials.

In our context, a function $f$ with a small $k$-th uniformity norm is pseudorandom in the
following sense. Consider a linearity test associated with a hypergraph $H = (V, E)$ with maximal
edge-size $k$. Theorem 2.2 implies that the $|E|$ copies of the basic linearity test that $H$ runs on
$f$ behave essentially independently.

5.3 Quadratic Fourier Analysis 

We would now like to give a more specific meaning to the intuitive notion that a function with 
a high k-th uniformity norm should have a non-trivial combinatorial structure, presumably a 
non-trivial correlation with a polynomial of degree k — 1. 

Unfortunately, at this point, we can only do it for $k = 3$. By Theorem 2.3, if $\|f\|_{U_3} \ge \epsilon$
then there exists a quadratic polynomial $g$ such that the distance between $f$ and $g$ is at most
$1/2 - \epsilon'$, for $\epsilon'$ depending on $\epsilon$ only.

We conjecture a similar statement to be true for any fixed $k$. A step in this direction was
made in [?], where a function with a high $k$-th uniformity norm is shown to have variables
with large influence.

Similar results for $k = 3$, but replacing $\mathbb{Z}_2^n$ by finite Abelian groups of cardinality indivisible
by 6, have been independently proved by Green and Tao [14]. The dependence of $\epsilon'$ on $\epsilon$ in both
cases is super-exponential. In [14] this dependence is improved in the following way: it is shown,
specializing here to $\mathbb{Z}_5^n$ for clarity, that one can find a subspace $V$ of $\mathbb{Z}_5^n$ of a fixed co-dimension
and a family of quadratic polynomials $g_y$ indexed by cosets of $V$, such that typically $f$ is $(1/2 - \epsilon'')$-close
to $g_y$ on $y + V$, where the dependence of $\epsilon''$ on $\epsilon$ is polynomial. This extension turns out
to be useful in obtaining good bounds on arithmetic progressions of length 4 in subsets of $\mathbb{Z}_5^n$
(and in general finite Abelian groups). In this context, Green and Tao introduce the notion of
quadratic Fourier analysis [?]. According to this point of view, the subject of classical Fourier
analysis is to represent a function as a combination of several linear functions (elements of the
Fourier basis) it has non-negligible correlation with (i.e., corresponding Fourier coefficients are
large), and of a 'random' remainder (a function with small Fourier coefficients). In quadratic
Fourier analysis, a function is approximated by a combination of quadratic polynomials. This
approach has proven to be quite effective in additive number theory [13], [?] and in ergodic
theory [16, 28], in situations in which classical Fourier analysis fails.



Theorems 2.3 and 3.3 can be viewed as an application of quadratic Fourier analysis on $\mathbb{Z}_2^n$
to boolean functions. We suggest that this tool might have other applications as well. (Among
other things, it should be possible to extend Theorem 2.3 to obtain results similar to those of
[14], but we haven't checked the details.)

6 Appendix A: A Proof of Theorem 2.3 

We start with a short discussion on discrete directional derivatives of functions on $\{0,1\}^n$.

Let $f : \{0,1\}^n \to \{-1,1\}$ be a boolean function, and $y$ be a vector in $\{0,1\}^n$. We define
the "derivative of $f$ in direction $y$" by

$$f_y(x) = f(x) f(x+y).$$

The transformation $f \mapsto f_y$ is a linear operator (when $f = (-1)^p$ is viewed through the polynomial $p$
over $\mathbb{F}_2$ in the exponent, it acts as $p(x) \mapsto p(x) + p(x+y)$). This operator decreases the degree of the
polynomial representation of a function: if $f$ is representable by an $n$-variate polynomial of
degree $d$, then $f_y$ is representable by a polynomial of degree $d-1$.

We define recursively $f_{y_1,y_2} = (f_{y_1})_{y_2}$. It is easy to see that $f_{y_1,y_2} = f_{y_2,y_1}$, and in fact
$f_{y_1,y_2}(x) = f_{y_2,y_1}(x) = f(x) f(x+y_1) f(x+y_2) f(x+y_1+y_2)$. Similarly, the $k$-th order directional
derivative $f_{y_1,\ldots,y_k}$ of $f$ with respect to $y_1,\ldots,y_k$ at a point $x$ is given by

$$f_{y_1,\ldots,y_k}(x) = \prod_{S \subseteq [k]} f\Big(x + \sum_{i \in S} y_i\Big).$$

If a function $f$ is a polynomial of degree $d$, then the derivative $f_{y_1,\ldots,y_k}$ is a polynomial of degree
$d-k$, for all choices of linearly independent $y_1,\ldots,y_k \in \{0,1\}^n$. In particular, the $(k+1)$-th
derivative of a degree-$k$ polynomial vanishes (in our terms, it is identically 1).
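
The definitions above can be checked numerically on small examples. The following is only an illustrative sketch, not part of the paper: it assumes vectors of $\{0,1\}^n$ are encoded as Python integers (so that addition is XOR), and the helper names `derivative`, `iterated_derivative`, `subset_product` and `random_quadratic` are hypothetical. It compares the recursive definition with the product formula over subsets of $[k]$, and computes a third derivative of a quadratic polynomial.

```python
import itertools
import random

def derivative(f, y):
    """f_y(x) = f(x) * f(x + y); f is a list of +-1 values indexed by x."""
    return [f[x] * f[x ^ y] for x in range(len(f))]

def iterated_derivative(f, ys):
    """f_{y_1,...,y_k}, obtained by differentiating one direction at a time."""
    for y in ys:
        f = derivative(f, y)
    return f

def subset_product(f, ys, x):
    """Product over all subsets S of [k] of f(x + sum_{i in S} y_i)."""
    value = 1
    for size in range(len(ys) + 1):
        for subset in itertools.combinations(ys, size):
            shift = 0
            for y in subset:
                shift ^= y
            value *= f[x ^ shift]
    return value

def random_quadratic(n, rng):
    """g(x) = (-1)^(<x, Ax> + a) for a random n x n matrix A over F_2."""
    rows = [rng.getrandbits(n) for _ in range(n)]  # row i of A as a bitmask
    a = rng.getrandbits(1)
    values = []
    for x in range(2 ** n):
        exponent = a
        for i in range(n):
            if (x >> i) & 1:  # contributes x_i * <A_i, x>
                exponent ^= bin(rows[i] & x).count("1") & 1
        values.append(-1 if exponent else 1)
    return values

n = 4
rng = random.Random(0)
f = [rng.choice([-1, 1]) for _ in range(2 ** n)]
g = random_quadratic(n, rng)
directions = [rng.randrange(2 ** n) for _ in range(3)]
third = iterated_derivative(g, directions)  # third derivative of a quadratic
```

Under these assumptions, `third` should be the all-ones list, in line with the remark that the $(k+1)$-th derivative of a degree-$k$ polynomial is identically 1.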

Observe that, in light of the definition above, the claim of Theorem 2.3 can be interpreted
as follows: if a random third derivative of a function vanishes with probability greater than 1/2
then the function is somewhat close to a quadratic polynomial.

The proof of the theorem involves several technical lemmas. The main tools are Fourier
analysis on $\mathbb{Z}_2^n$ ([17]) and additive number theory.

In the following the Greek letters $\epsilon, \epsilon', \delta, \delta'$ will denote absolute positive constants (independent
of $n$) whose value may fluctuate.

Lemma 6.1: For a function $f : \{0,1\}^n \to \mathbb{R}$

$$\|f\|_{U_3}^8 = E_y \sum_\alpha \widehat{f_y}^4(\alpha).$$

Proof: We start with proving

$$\|f\|_{U_2}^4 = \sum_\alpha \hat{f}^4(\alpha).$$

Indeed,

$$\|f\|_{U_2}^4 = E_{x,y,z} f(x)f(x+y)f(x+z)f(x+y+z) = E_x\big(f(x) \cdot E_{y,z} f(x+y)f(x+z)f(x+y+z)\big).$$

Introducing new variables $u = x+y$, $w = x+z$, this equals

$$E_x\big(f(x) \cdot E_{u,w} f(u)f(w)f(x+u+w)\big) = E_x\big(f(x) \cdot E_u f(u)(f*f)(x+u)\big) = E_x f(x)(f*f*f)(x) = \langle f, f*f*f\rangle = \langle \hat{f}, \hat{f}^3\rangle = \sum_\alpha \hat{f}^4(\alpha).$$

Now,

$$\|f\|_{U_3}^8 = E_{x,y,z,w} f(x)f(x+y)f(x+z)f(x+w)f(x+y+z)f(x+y+w)f(x+z+w)f(x+y+z+w)$$
$$= E_y E_{x,z,w} f_y(x)f_y(x+z)f_y(x+w)f_y(x+z+w) = E_y \sum_\alpha \widehat{f_y}^4(\alpha).$$

∎
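
The identity of Lemma 6.1 can be verified by brute force for small $n$. The sketch below is an illustration under stated assumptions, not the paper's method: vectors are encoded as integers with XOR as addition, expectations are uniform averages, and the function names (`fourier`, `u3_to_the_eighth`, `averaged_fourth_moments`) are hypothetical.

```python
import random

def fourier(h):
    """hat h(alpha) = E_x h(x) * (-1)^<alpha, x>, vectors encoded as ints."""
    size = len(h)
    return [sum(h[x] * (-1) ** (bin(a & x).count("1") & 1) for x in range(size)) / size
            for a in range(size)]

def u3_to_the_eighth(f):
    """Left-hand side: the average of f over all 3-dimensional parallelepipeds."""
    size = len(f)
    total = 0
    for x in range(size):
        for y in range(size):
            for z in range(size):
                for w in range(size):
                    total += (f[x] * f[x ^ y] * f[x ^ z] * f[x ^ w]
                              * f[x ^ y ^ z] * f[x ^ y ^ w] * f[x ^ z ^ w]
                              * f[x ^ y ^ z ^ w])
    return total / size ** 4

def averaged_fourth_moments(f):
    """Right-hand side: E_y sum_alpha hat{f_y}(alpha)^4."""
    size = len(f)
    total = 0.0
    for y in range(size):
        f_y = [f[x] * f[x ^ y] for x in range(size)]
        total += sum(c ** 4 for c in fourier(f_y))
    return total / size

n = 3
rng = random.Random(0)
f = [rng.choice([-1, 1]) for _ in range(2 ** n)]
lhs = u3_to_the_eighth(f)
rhs = averaged_fourth_moments(f)
print(lhs, rhs)
```

The two printed values should agree up to floating-point error for any choice of `f`.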

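Corollary 6.2: Assuming $f$ is boolean and $\|f\|_{U_3} \ge \epsilon$, there exist constants $\delta, \delta'$ and a choice
function $\phi : \{0,1\}^n \to \{0,1\}^n$ such that

$$\Pr_y\big\{|\widehat{f_y}(\phi(y))| > \delta\big\} > \delta'.$$

Proof: The derivatives $f_y$ are also boolean functions, and therefore

$$E_y \sum_\alpha \widehat{f_y}^4(\alpha) \le E_y \max_\alpha \widehat{f_y}^2(\alpha) \cdot \sum_\alpha \widehat{f_y}^2(\alpha) = E_y \max_\alpha \widehat{f_y}^2(\alpha).$$

∎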

Let $A$ be an $n \times n$ matrix over $\mathbb{F}_2$. If $g = (-1)^{\langle x, Ax\rangle + a}$ is a quadratic polynomial, then
$g_y(x) = (-1)^{\langle (A+A^t)y, x\rangle + c_y} = (-1)^{\langle By, x\rangle + c_y}$, where $c_y = \langle y, Ay\rangle$ does not depend on $x$. Here $B = A + A^t$ is a symmetric matrix with a
zero diagonal. So for a quadratic polynomial the choice function $\phi(y) = By$ is linear, and of a
special form. We will therefore look for similar properties of the choice function in our case.
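
As a small illustration of the computation above (a sketch only; the bitmask encoding of matrices and the names `parity`, `mat_vec`, `quad` and `coefficient` are assumptions made for this example), one can check that for a quadratic polynomial the derivative $g_y$ has a Fourier coefficient of absolute value 1 at $By$, and that $B = A + A^t$ is symmetric with a zero diagonal.

```python
import random

def parity(v):
    return bin(v).count("1") & 1

n = 4
rng = random.Random(1)
A = [rng.getrandbits(n) for _ in range(n)]  # row i of A as a bitmask
At = [sum(((A[j] >> i) & 1) << j for j in range(n)) for i in range(n)]  # transpose of A
B = [A[i] ^ At[i] for i in range(n)]  # B = A + A^t over F_2

def mat_vec(M, v):
    """Matrix-vector product over F_2; bit i of the result is <M_i, v>."""
    return sum(parity(M[i] & v) << i for i in range(n))

def quad(x):
    """The quadratic form <x, Ax> over F_2."""
    return parity(mat_vec(A, x) & x)

g = [(-1) ** quad(x) for x in range(2 ** n)]

def coefficient(h, alpha):
    """hat h(alpha) = E_x h(x) * (-1)^<alpha, x>."""
    return sum(h[x] * (-1) ** parity(alpha & x) for x in range(len(h))) / len(h)

values = []
for y in range(2 ** n):
    g_y = [g[x] * g[x ^ y] for x in range(2 ** n)]
    values.append(coefficient(g_y, mat_vec(B, y)))  # equals (-1)^<y, Ay>
```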

It is sufficient to find a choice function that coincides with an appropriate linear function 
with positive probability. This will follow from an observation that if derivatives of two boolean 
functions are close on average then so are the functions themselves (up to a linear shift). 

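Lemma 6.3: For boolean functions $f, g$:

$$E_x \langle f_x, g_x\rangle^2 = \sum_\alpha \widehat{fg}^4(\alpha).$$

Proof:

$$E_x \langle f_x, g_x\rangle^2 = E_x E_{y_1,y_2} f_x(y_1)g_x(y_1)f_x(y_2)g_x(y_2)$$
$$= E_x E_{y_1,y_2} f(y_1)f(y_1+x)g(y_1)g(y_1+x)f(y_2)f(y_2+x)g(y_2)g(y_2+x)$$
$$= E_x\big(E_y (fg)(y)(fg)(y+x)\big)^2 = E_x\big((fg)*(fg)\big)^2(x) = \sum_\alpha \widehat{fg}^4(\alpha).$$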
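
The chain of equalities in this proof can be checked numerically. The sketch below is illustrative only and assumes the normalizations $\langle u, v\rangle = E_y u(y)v(y)$ and $\hat{h}(\alpha) = E_x h(x)(-1)^{\langle \alpha, x\rangle}$; the helper names are hypothetical.

```python
import random

def fourier(h):
    """hat h(alpha) = E_x h(x) * (-1)^<alpha, x>, vectors encoded as ints."""
    size = len(h)
    return [sum(h[x] * (-1) ** (bin(a & x).count("1") & 1) for x in range(size)) / size
            for a in range(size)]

def derivative(h, x):
    """h_x(y) = h(y) * h(y + x)."""
    return [h[y] * h[y ^ x] for y in range(len(h))]

n = 3
size = 2 ** n
rng = random.Random(2)
f = [rng.choice([-1, 1]) for _ in range(size)]
g = [rng.choice([-1, 1]) for _ in range(size)]

# Left-hand side: E_x <f_x, g_x>^2
lhs = 0.0
for x in range(size):
    f_x, g_x = derivative(f, x), derivative(g, x)
    inner = sum(f_x[y] * g_x[y] for y in range(size)) / size
    lhs += inner ** 2
lhs /= size

# Right-hand side: sum of fourth powers of the Fourier coefficients of f*g (pointwise product)
rhs = sum(c ** 4 for c in fourier([f[x] * g[x] for x in range(size)]))
```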



Footnote (on the definition of a quadratic polynomial): Observe that, working with the field of 2 elements, we can incorporate the linear term of a quadratic form
in the exponent into the quadratic term, by modifying the diagonal of the matrix appropriately.

Corollary 6.4: Let $B$ be a symmetric matrix with a zero diagonal such that

$$E_y \widehat{f_y}^2(By) > \epsilon.$$

Then there exists a quadratic polynomial $g$ such that

$$\|f - g\| < \frac{1}{2} - \epsilon'.$$

Proof: Let $A$ be a matrix such that $A + A^t = B$. Consider a quadratic polynomial $h(x) = (-1)^{\langle x, Ax\rangle}$. We have

$$E_x \langle f_x, h_x\rangle^2 = E_x \widehat{f_x}^2(Bx) > \epsilon.$$

By Lemma 6.3 there is a vector $\alpha$ such that $|\widehat{fh}(\alpha)| > \epsilon'$. This implies that there is a choice of
$a \in \{0,1\}$ such that for a quadratic polynomial $g(x) = (-1)^{\langle x, Ax\rangle + \langle \alpha, x\rangle + a}$ holds

$$\|f - g\| < \frac{1}{2} - \epsilon'.$$

∎

We start by finding a weakly linear choice function. This is made possible by the following 
observation. 

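Lemma 6.5:

$$E_{x,y} \sum_{\alpha,\beta} \widehat{f_x}^2(\alpha)\,\widehat{f_y}^2(\beta)\,\widehat{f_{x+y}}^2(\alpha+\beta) = E_s \sum_\alpha \widehat{f_s}^6(\alpha).$$

Proof: We start with an observation that for a boolean function $f$ and for any $x, s$ in $\{0,1\}^n$
holds $(f_x * f_x)(s) = (f_s * f_s)(x)$. Indeed, expanding

$$(f_s * f_s)(x) = E_y f_s(y) f_s(x+y) = E_y f(y)f(y+s)f(x+y)f(x+y+s).$$

Define a function $F : \{0,1\}^n \to \{-1,1\}$ by taking $F(y) = f(y)f(y+(x+s))$. Then the last
expression is $(F*F)(x) = (F*F)(s)$. Expanding $(f_x * f_x)(s)$ we get the same result.

Now, writing $w_\alpha(u) = (-1)^{\langle \alpha, u\rangle}$,

$$E_{x,y} \sum_{\alpha,\beta} \widehat{f_x}^2(\alpha)\,\widehat{f_y}^2(\beta)\,\widehat{f_{x+y}}^2(\alpha+\beta)$$
$$= E_{x,y} \sum_{\alpha,\beta} E_{u,u'} f_x(u)f_x(u')w_\alpha(u+u') \, E_{v,v'} f_y(v)f_y(v')w_\beta(v+v') \, E_{z,z'} f_{x+y}(z)f_{x+y}(z')w_{\alpha+\beta}(z+z')$$
$$= E_{x,y} E_s E_{u,v,z} f_x(u)f_x(u+s)\,f_y(v)f_y(v+s)\,f_{x+y}(z)f_{x+y}(z+s)$$
$$= E_s E_{x,y} (f_x*f_x)(s)\,(f_y*f_y)(s)\,(f_{x+y}*f_{x+y})(s)$$
$$= E_s E_{x,y} (f_s*f_s)(x)\,(f_s*f_s)(y)\,(f_s*f_s)(x+y) = E_s \sum_\alpha \widehat{f_s}^6(\alpha).$$

∎

Corollary 6.6:

$$\|f\|_{U_3} > \epsilon \implies E_{x,y} \sum_{\alpha,\beta} \widehat{f_x}^2(\alpha)\,\widehat{f_y}^2(\beta)\,\widehat{f_{x+y}}^2(\alpha+\beta) > \epsilon'.$$

Proof: By Lemmas 6.1 and 6.5 and Hölder's inequality. ∎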
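
The identity of Lemma 6.5, and the Hölder step used for Corollary 6.6, can be illustrated numerically. This is a sketch under the same assumptions as before (integer encoding of vectors, uniform averages, hypothetical names); the inequality checked at the end is one way to carry out the Hölder step, namely $(E_s \sum_\alpha \widehat{f_s}^4(\alpha))^2 \le E_s \sum_\alpha \widehat{f_s}^6(\alpha)$ for boolean $f$.

```python
import random

def fourier(h):
    """hat h(alpha) = E_x h(x) * (-1)^<alpha, x>, vectors encoded as ints."""
    size = len(h)
    return [sum(h[x] * (-1) ** (bin(a & x).count("1") & 1) for x in range(size)) / size
            for a in range(size)]

n = 3
size = 2 ** n
rng = random.Random(3)
f = [rng.choice([-1, 1]) for _ in range(size)]

# spectra[x][alpha] = Fourier coefficient of the derivative f_x at alpha
spectra = [fourier([f[u] * f[u ^ x] for u in range(size)]) for x in range(size)]

# Left-hand side of Lemma 6.5
lhs = sum(spectra[x][a] ** 2 * spectra[y][b] ** 2 * spectra[x ^ y][a ^ b] ** 2
          for x in range(size) for y in range(size)
          for a in range(size) for b in range(size)) / size ** 2

# Right-hand side of Lemma 6.5
rhs = sum(c ** 6 for s in range(size) for c in spectra[s]) / size

# By Lemma 6.1 this equals the eighth power of the U_3 norm of f
u3_eighth = sum(c ** 4 for s in range(size) for c in spectra[s]) / size
```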

Define a product distribution on functions $\phi : \{0,1\}^n \to \{0,1\}^n$ by taking $\Pr\{\phi(x) = \alpha\} = \widehat{f_x}^2(\alpha)$.
The choices for distinct values of $x$ are independent. Let $\delta$ = ^. Define a random
variable $L$ on this probability space, by taking

$$L(\phi) = \Pr_{x,y}\Big\{\phi(x) + \phi(y) = \phi(x+y);\ \widehat{f_x}^2(\phi(x)) > \delta;\ \ldots\ \widehat{f_{x+y}}^2(\phi(x+y)) > \delta\Big\}.$$


Lemma 6.7: 

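Proof:

$$E_\phi L(\phi) = E_{x,y} \Pr_\phi\Big\{\phi(x) + \phi(y) = \phi(x+y);\ \widehat{f_x}^2(\phi(x)) > \delta;\ \ldots\ \widehat{f_{x+y}}^2(\phi(x+y)) > \delta\Big\}$$
$$= E_{x,y} \sum_{\alpha,\beta\,:\,\widehat{f_x}^2(\alpha) > \delta;\ \ldots\ \widehat{f_{x+y}}^2(\alpha+\beta) > \delta} \widehat{f_x}^2(\alpha)\,\widehat{f_y}^2(\beta)\,\widehat{f_{x+y}}^2(\alpha+\beta)$$
$$\ge E_{x,y} \sum_{\alpha,\beta} \widehat{f_x}^2(\alpha)\,\widehat{f_y}^2(\beta)\,\widehat{f_{x+y}}^2(\alpha+\beta) - 3\delta > \epsilon' - 3\delta,$$

and $\epsilon' - 3\delta$ > |-. ∎

Take $\phi$ for which $L(\phi)$ > ^. This is the choice function we choose. Our goal is to find
an appropriate linear transformation $B : \{0,1\}^n \to \{0,1\}^n$ such that $\phi$ and $B$ coincide on a
positive fraction of the domain in which $\widehat{f_x}^2(\phi(x)) > \delta$.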

We will do this in several steps. In the first step we will find an affine transformation
$x \to Dx + z$ such that $E_x \widehat{f_x}^2(Dx + z) > \epsilon'$. Then we will gradually modify this transformation
to obtain a symmetric linear transformation $B$ with a zero diagonal such that $E_x \widehat{f_x}^2(Bx) > \epsilon'$.
By Corollary 6.4 this will conclude the proof of the theorem.

The first step is the hardest. We will follow an approach of Gowers from his proof of
Szemerédi's theorem for arithmetic progressions of length four [11]. Note that a structural
theorem of Freiman for sets with small sumsets in $\mathbb{Z}$ is replaced by a theorem of Ruzsa for such
sets in $\mathbb{Z}_2^n$.

Let $A = \{x : \widehat{f_x}^2(\phi(x)) > \delta\}$. Then, by the choice of $\phi$, the cardinality $m$ of $A$ is a positive
fraction of $2^n$, and there are $\Omega(m^2)$ triples $(x, y, x+y)$ in $A^3$ satisfying $\phi(x) + \phi(y) = \phi(x+y)$.
Now define a subset $\mathcal{A}$ of $\{0,1\}^{2n}$ as

$$\mathcal{A} = \{(x, \phi(x)) : x \in A\}.$$

$\mathcal{A}$ is the graph of $\phi$ on $A$. We have $|\mathcal{A}| = |A| = m$, and there are $\Omega(m^2)$ triples $(a, b, a+b)$ in $\mathcal{A}^3$.
Theorem 6.8: (Gowers [11]) For any subset $\mathcal{A}$ of an abelian group satisfying the above, there is a
subset $\mathcal{A}'$ of $\mathcal{A}$ containing a constant fraction of the elements, such that

$$|\mathcal{A}' + \mathcal{A}'| < c \cdot |\mathcal{A}'|,$$

for an absolute constant $c$.

Theorem 6.9: (Ruzsa 12^ ) Let $G$ be an abelian group. Assume that the order of the elements
in $G$ is bounded, and let $r$ be the maximal order of an element. Let $\mathcal{A}' \subseteq G$ with the property
above. Then

$$\frac{|\mathcal{A}'|}{|\langle \mathcal{A}' \rangle|} \ge c' \cdot r^{-c''},$$

for some absolute constants $c', c''$.

We assume the projection of $\langle \mathcal{A}' \rangle$ on the first $n$ coordinates to be of full rank. (Otherwise
add a finite number of vectors to $\langle \mathcal{A}' \rangle$ to ensure this.) Therefore, there are $n$ vectors $v_1 \ldots v_n$
in $\{0,1\}^n$ such that the vectors $u_i = (e_i, v_i)$ are in $\langle \mathcal{A}' \rangle$. Let $U = \langle u_1 \ldots u_n \rangle$. Clearly
$U = \{(x, Dx) : x \in \{0,1\}^n\}$, where the matrix $D$ is defined by $De_i = v_i$, $i = 1 \ldots n$. $U$ is a
subspace of $\langle \mathcal{A}' \rangle$ of a finite co-dimension. Therefore there exists a vector $c \in \{0,1\}^{2n}$ such
that a constant fraction of the vectors in $\mathcal{A}$ sit in $U + c$. This is the same as to say that there
is a vector $z \in \{0,1\}^n$ such that for $\Omega(2^n)$ points $x \in A$ holds $\phi(x) = Dx + z$. Alternatively:

$$E_x \widehat{f_x}^2(Dx + z) > \epsilon'.$$

We can choose $z = 0$.



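Lemma 6.10: Define a function $F : \{0,1\}^n \to \mathbb{R}$ by $F(z) = \sum_y \widehat{f_y}^2(Dy + z)$. Then

$$\hat{F}(x) = \widehat{f_x}^2(D^t x).$$

Proof:

$$\hat{F}(x) = E_z F(z) w_x(z) = E_z w_x(z) \sum_y \widehat{f_y}^2(Dy + z) = E_z w_x(z) \sum_y E_{u,u'} f_y(u) f_y(u') w_{Dy+z}(u+u')$$
$$= E_y E_u f_y(u) f_y(u+x) w_{Dy}(x) = E_y (f_y * f_y)(x)\, w_{Dy}(x) = E_y (f_x * f_x)(y)\, w_{D^t x}(y) = \widehat{f_x}^2(D^t x).$$

∎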
Since the transform of $F$ is nonnegative, $F$ attains its maximum in 0. Therefore

$$E_x \widehat{f_x}^2(Dx) \ge E_x \widehat{f_x}^2(Dx + z) > \epsilon'.$$

We want to replace $D$ by a symmetric matrix. The following fact is useful.



Lemma 6.11:

$$\widehat{f_y}(x) = 0$$

for any $x$ and $y$ with $\langle x, y\rangle = 1$.

Proof:

$$\widehat{f_y}(x) = E_z f_y(z) w_x(z) = E_z f(z) f(y+z) w_x(z) = E_z f(z) f(y+z) w_x(y+z)$$
$$= w_x(y)\, E_z f(z) f(y+z) w_x(z) = w_x(y)\, \widehat{f_y}(x) = -\widehat{f_y}(x).$$

∎
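
A brute-force check of Lemma 6.11 for a random boolean function is sketched below. This is an illustration under the usual assumptions of this appendix (vectors as integers, XOR as addition); the names `parity`, `coefficient` and `violations` are hypothetical.

```python
import random

def parity(v):
    return bin(v).count("1") & 1

def coefficient(h, alpha):
    """hat h(alpha) = E_z h(z) * (-1)^<alpha, z>."""
    return sum(h[z] * (-1) ** parity(alpha & z) for z in range(len(h))) / len(h)

n = 4
size = 2 ** n
rng = random.Random(7)
f = [rng.choice([-1, 1]) for _ in range(size)]

# Collect every pair (x, y) with <x, y> = 1 at which the coefficient fails to vanish.
violations = []
for y in range(size):
    f_y = [f[z] * f[z ^ y] for z in range(size)]
    for x in range(size):
        if parity(x & y) == 1 and abs(coefficient(f_y, x)) > 1e-12:
            violations.append((x, y))
```

The list `violations` should come out empty: $f_y$ is invariant under the shift $z \to z + y$, which forces the coefficient at any $x$ with $\langle x, y\rangle = 1$ to vanish.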

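Therefore, for $g(x) = (-1)^{\langle x, Dx\rangle}$ holds

$$E_x g(x) \widehat{f_x}^2(Dx) = E_x \widehat{f_x}^2(Dx) > \epsilon'.$$

On the other hand

$$E_x g(x) \widehat{f_x}^2(Dx) = \sum_z \hat{g}(z)\, E_x \widehat{f_x}^2(D^t x + z).$$

Since the numbers $\lambda_z = E_x \widehat{f_x}^2(D^t x + z)$ are nonnegative and sum to one, we deduce by Jensen's
inequality that

$$\sum_z \hat{g}^2(z)\, E_x \widehat{f_x}^2(D^t x + z) > (\epsilon')^2.$$

However

$$\sum_z \hat{g}^2(z)\, E_x \widehat{f_x}^2(D^t x + z) = E_x (g * g)(x)\, \widehat{f_x}^2(Dx) = E_x \delta_{Dx, D^t x} \cdot g(x)\, \widehat{f_x}^2(Dx).$$
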

Let $S$ be a matrix defined in the following way: Set $U = \{x : Dx = D^t x\}$. Then $U$ is a
subspace of $\mathbb{Z}_2^n$. Let $S$ be defined on $U$ by taking $S(x) = D(x)$ on $U$. Then for any $x, y \in U$
holds $\langle x, Sy\rangle = \langle Sx, y\rangle$. Now the definition of $S$ could be extended to the whole space keeping
this property. Therefore $S$ is a symmetric matrix such that

$$E_x \widehat{f_x}^2(Sx) > \epsilon.$$

It remains to deal with the diagonal of $S$. Let $d$ be the vector on the diagonal of $S$. Since
$\langle x, Sx\rangle = \langle x, d\rangle$, Lemma 6.11 shows that only the points $x \perp d$ contribute, and we have

$$2^{-n} \sum_{x \perp d} \widehat{f_x}^2(Sx) > \epsilon.$$

Define a matrix $B$ by taking $Bx = Sx$ if $x \perp d$, and extending $B$ appropriately to the whole
space. Namely for $w \not\perp d$ take $Bw = z$, so that $\langle z, x\rangle = \langle w, Bx\rangle = \langle w, Sx\rangle$ for all $x \perp d$, and
$\langle w, z\rangle = 0$. Then $B$ is symmetric with zero diagonal, and

$$E_x \widehat{f_x}^2(Bx) > \epsilon'.$$

This concludes the proof of Theorem 2.3 but for the dependence of $\epsilon'$ on $\epsilon$. Tracing this
dependence through the proof, it is possible to see that we can choose $\epsilon' > \Omega\big(\exp\{-(1/\epsilon)^C\}\big)$
for an absolute constant $C$. ∎

7 Appendix B: A Proof of Theorem 2.2

We will prove Theorem 5.3.

This will imply Theorem 2.2 as follows: Let $f : \{0,1\}^n \to \{-1,1\}$ be a boolean function.
Let $H = (V, E)$ be a hypergraph with maximal edge-size $d$. For a subset $S \subseteq E$ of edges of $H$,
let (j[S) = X^eesd^l "'" ■*■)■ "^^^ probability that the linearity test associated with H accepts / 
is given by 



nn/(^^)-/ 



x^^i 



2l 

The summand corresponding to $S = \emptyset$ is 1. By Theorem 5.3 all the other summands are at
most $\|f\|_{U_d}$ in absolute value. Theorem 2.2 follows.

We start with some facts on generalized averages $E_A$. Recall that each such average is
naturally associated with a $t \times T$ binary matrix $A$. The first observation is that some matrices
(families of sets) define the same average operator.

Lemma 7.1: Multiplying $A$ on the left by a non-singular $t \times t$ matrix $B$ does not change the
value of the average, namely for any function $f$ holds $E_A(f) = E_{BA}(f)$.

Proof: $E_A$ and $E_{BA}$ are the same up to order of summation. ∎
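
Lemma 7.1 can be illustrated on a toy example. The sketch below rests on an assumption about the definition, which is given earlier in the paper and not repeated here: it takes $E_A(f) = E_{y_1,\ldots,y_t} \prod_{j=1}^{T} f\big(\sum_{i : A_{ij} = 1} y_i\big)$, with one factor per column of $A$. The function name `generalized_average` and the particular matrix are hypothetical.

```python
import itertools
import random

def generalized_average(f, A, n):
    """Assumed definition: average over y_1..y_t of prod_j f(sum_{i: A[i][j]=1} y_i)."""
    t, T = len(A), len(A[0])
    total = 0
    for ys in itertools.product(range(2 ** n), repeat=t):
        term = 1
        for j in range(T):
            point = 0
            for i in range(t):
                if A[i][j]:
                    point ^= ys[i]  # addition in {0,1}^n is XOR
            term *= f[point]
        total += term
    return total / (2 ** n) ** t

n = 2
rng = random.Random(4)
f = [rng.choice([-1, 1]) for _ in range(2 ** n)]
A = [[1, 0, 1, 1],
     [0, 1, 1, 0],
     [0, 0, 1, 1]]
# Left multiplication by a non-singular B: here the elementary operation
# "add the last row to the first row".
BA = [[a ^ c for a, c in zip(A[0], A[2])], A[1], A[2]]

average_A = generalized_average(f, A, n)
average_BA = generalized_average(f, BA, n)
```

Under the assumed definition the two averages coincide, because left multiplication by $B$ corresponds to the invertible change of variables $y \to B^t y$, which only reorders the summation.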

Corollary 7.2: We may (and will) assume that the rows of $A$ are linearly independent, since
if $\mathrm{rank}(A) = r < t$ we can choose a non-singular $t \times t$ matrix $B$, so that in $BA$ the last $t - r$
rows are zeroes, and consequently can be removed without changing the value of $E_A$.

Consider an equivalence relation on $t \times T$ binary matrices of full rank, defined by left
multiplication by a non-singular $t \times t$ matrix. It is easy to see that two matrices are equivalent
iff their rows span the same $t$-dimensional space over $\mathbb{Z}_2$ (or they represent the same rank-$t$
binary matroid on $\{1 \ldots T\}$ |2nj ).

The following definition and lemmas are natural (and well-known) in the setting of matroids
(see mi).

Definition 7.3: A hyperplane of $A$ is a maximal subset of $\{1, \ldots, T\}$ such that the columns of
$A$ indexed by this subset are not of full rank.

Lemma 7.4: A vector v is a minimal non-zero vector in the row space of A iff the complement 
of its support is a hyperplane of A. 

Lemma 7.5: The row space of A is spanned by its minimal non-zero vectors. 

The key part of the proof of Theorem 5.3 is the following technical proposition:

Proposition 7.6: Let $A$ be a full rank $t \times T$ binary matrix. Let $v$ be a minimal vector in the
row-space of $A$. Let $A'$ be a $(t+1) \times 2|v|$ matrix obtained from $A$ by the following procedure:

1. Delete all the columns not in the support of $v$, obtaining a $t \times |v|$ matrix $B$.

2. Set

$$A' = \begin{pmatrix} B & B \\ 1 \ldots 1 & 0 \ldots 0 \end{pmatrix}.$$

Then for all boolean functions $f$,

$$E_A(f) \le \sqrt{E_{A'}(f)}.$$


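Example 7.7: Let

$$A = \begin{pmatrix} 1 & 0 & 1 \\ 1 & 1 & 0 \end{pmatrix}$$

and take $v = \{1, 3\}$. Then

$$A' = \begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & 0 & 1 & 0 \\ 1 & 1 & 0 & 0 \end{pmatrix}.$$
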


Proof: Let $H$ be the complement of $v$, and by Lemma 7.4 a hyperplane of $A$. Let $H_0$ be a
maximal independent subset (a basis) of $H$, of size $t - 1$. We assume that $H = \{1, \ldots, |H|\}$
and $H_0 = \{1, \ldots, t-1\}$. Multiply $A$ on the left by a non-singular $t \times t$ matrix $B$ so that the
first $t - 1$ columns of $BA$ are the first $t - 1$ unit vectors. Since $H_0$ is a basis of $H$, the columns
of $BA$ indexed by $H$ will have a zero in their last coordinate, while the columns in $v = H^c$ will
have one (since $H$ is a hyperplane). Namely

$$BA = \begin{pmatrix} I_{t-1} & N \\ 0 \ldots 0 & 0 \ldots 0\, 1 \ldots 1 \end{pmatrix}.$$

We have, for any $f$:

$$E_A(f) = E_{BA}(f) = E_{y_1,\ldots,y_t}\, f(y_1) \cdots f(y_{t-1}) \prod_{i=t}^{T} f\Big(\sum_{j \in A_i} y_j\Big).$$

We upper bound the right hand side in the following way, applying the Cauchy-Schwarz inequality:

$$E_A(f) = E_{y_1,\ldots,y_{t-1}} F(y_1, \ldots, y_{t-1}) \cdot G(y_1, \ldots, y_{t-1}) \le \sqrt{E_{y_1,\ldots,y_{t-1}} F^2(y_1, \ldots, y_{t-1})} \cdot \sqrt{E_{y_1,\ldots,y_{t-1}} G^2(y_1, \ldots, y_{t-1})},$$

where $F(y_1, \ldots, y_{t-1}) = \prod_{j=1}^{t-1} f(y_j)$, and $G(y_1, \ldots, y_{t-1}) = E_{y_t} \prod_{i=t}^{T} f\big(\sum_{j \in A_i} y_j\big)$.

Observe that

$$E_{y_1,\ldots,y_{t-1}} F^2(y_1, \ldots, y_{t-1}) = E_{y_1,\ldots,y_{t-1}} f^2(y_1) \cdots f^2(y_{t-1}) = E\,1 = 1.$$

Therefore $E_A(f) \le \sqrt{E G^2}$. It is easily seen that $E G^2$ presents an average of $f$ on a $(t+1) \times 2(T - t + 1)$ matrix $A^{(1)}$, where

$$A^{(1)} = \begin{pmatrix} N & N \\ 0 \ldots 0\, 1 \ldots 1 & 0 \ldots 0\, 0 \ldots 0 \\ 0 \ldots 0\, 0 \ldots 0 & 0 \ldots 0\, 1 \ldots 1 \end{pmatrix}.$$


We now transform the matrix $A^{(1)}$ to $A'$ in three steps. Let $u$ and $v$ be the last two rows of the
matrix. First, replace $u$ with $u + v$ obtaining a new matrix $A^{(2)}$.

The second step uses booleanity of $f$. Note that for any matrix $A$ and any boolean function
$f$, deleting a pair of identical columns of $A$ does not change the average of $f$ on $A$. We delete
all the columns of $A^{(2)}$ in which the last two coordinates are zero, obtaining a $(t+1) \times 2|v|$
matrix $A^{(3)}$. The third step is to multiply $A^{(3)}$ by a $(t+1) \times (t+1)$ matrix $S_1 = \begin{pmatrix} B^{-1} & 0 \\ 0 & 1 \end{pmatrix}$,
obtaining $A'$. ∎

Now we are ready to prove Theorem 5.3. We will prove the theorem for $k = 3$. The proof
for larger values of $k$ is similar.

Let $A$ be a matrix with at most 3 ones in each column. Assume the rows of $A$ to be
independent. Let $u$ be the first row. The first step is to replace $u$ with a minimal non-zero
vector $v$ with a smaller support. If $u$ is already a minimal vector let $v = u$. Otherwise there
is a vector $w$ in the row-space of $A$ whose support is strictly smaller than that of $u$. One of the
vectors $w$ or $u + w$ is not spanned by the rest of the rows of $A$ and we set $u_1$ to be this vector.
If $u_1$ is minimal set $v = u_1$. Otherwise continue with $u_1$ instead of $u$. Clearly this process stops
after a finite number of steps and does not change the row space of $A$.

Now we apply the transformation of Proposition 7.6 to the new matrix $A$, choosing $v$ as the
appropriate minimal vector. Consider the submatrix $B$ of the new matrix $A'$. The first row of
$B$, and therefore of $A'$ as well, is a 1-vector. Moving it to be the last row, we obtain

$$A' = \begin{pmatrix} B' & B' \\ 1 \ldots 1 & 0 \ldots 0 \\ 1 \ldots 1 & 1 \ldots 1 \end{pmatrix}.$$

The matrix $B'$ has at most 2 ones in each column. Note that all the columns in $B'$ are distinct,
since so were the columns of $A$. There are two cases to distinguish.

$B'$ has only one column. Then, removing dependent rows, we get to a $2 \times 2$ matrix $A_1 = \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix}$.
By Proposition 7.6 for any boolean function $f$ holds $E_A(f) \le (E_{A_1}(f))^{1/2} = \|f\|_{U_1} \le \|f\|_{U_3}$.
In the last inequality we use monotonicity of uniformity norms.

$B'$ has more than one column. If there are dependencies between rows of $A'$ we remove
them, keeping only a spanning set of rows, starting with the last two. In particular $A'$ has no
all-1 rows except the last.

We now repeat the procedure starting from $A'$. Consider the first row $u$ of $A'$. $u = (u' \mid u')$
is a symmetric vector. If it is minimal set $v = u$. If not, there is a minimal vector $w$ of smaller
support such that replacing $u$ with $w$ does not affect the row space of $A'$. The vector $w$ is in this
row space, therefore it is either symmetric or antisymmetric. However if it is antisymmetric then
$u$ has to be an all-1 row, which we have excluded. Therefore we can replace it by a symmetric
minimal vector $v$. Now apply the proposition with $A'$ and $v$, and obtain a new matrix (after
simplification)

$$A'' = \begin{pmatrix} B'' & B'' & B'' & B'' \\ 1 \ldots 1 & 0 \ldots 0 & 1 \ldots 1 & 0 \ldots 0 \\ 1 \ldots 1 & 1 \ldots 1 & 0 \ldots 0 & 0 \ldots 0 \\ 1 \ldots 1 & 1 \ldots 1 & 1 \ldots 1 & 1 \ldots 1 \end{pmatrix}.$$

The matrix $B''$ has at most one 1 in each column. All the columns in $B''$ are distinct.

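Once again, there are two cases. If there is only one column in $B''$, after simplification we
get a $3 \times 4$ matrix

$$A_2 = \begin{pmatrix} 1 & 0 & 1 & 0 \\ 1 & 1 & 0 & 0 \\ 1 & 1 & 1 & 1 \end{pmatrix}$$

such that for any boolean function $f$ holds

$$E_A(f) \le (E_{A_2}(f))^{1/4} = \|f\|_{U_2} \le \|f\|_{U_3}.$$

If $B''$ has more than one column we iterate once again. It is not hard to see that the new
matrix $B'''$ will necessarily have only one column. After simplifying, we will get to a $4 \times 8$
matrix

$$A_3 = \begin{pmatrix} 1 & 0 & 1 & 0 & 1 & 0 & 1 & 0 \\ 1 & 1 & 0 & 0 & 1 & 1 & 0 & 0 \\ 1 & 1 & 1 & 1 & 0 & 0 & 0 & 0 \\ 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \end{pmatrix}$$

such that for any boolean function $f$ holds

$$E_A(f) \le (E_{A_3}(f))^{1/8} = \|f\|_{U_3}.$$

The theorem is proved. ∎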



8 Appendix C: Other proofs 

8.1 Proof of Proposition 3.2

First we estimate $\|f\|_{U_3}^8$ within additive precision of $O(\delta)$. This can be done by choosing at
random (^) quadruples of vectors $x, y_1, y_2, y_3 \in \{0,1\}^n$ and averaging $f_{y_1,y_2,y_3}(x)$ over the
choices. Let us call this average $\nu$. It is easy to see that for a sufficiently large number of
sampled quadruples, $\big|\nu - \|f\|_{U_3}^8\big| < \delta/2$ with high probability.

Assuming this is true, there are two possibilities. First, $\nu > \delta$. In this case, $\|f\|_{U_3}^8 > \delta/2$.
Theorem 2.3 now implies that the second option of the proposition holds.

The second option is $\nu < \delta$. In this case $\|f\|_{U_3}^8 < 3\delta/2$. Now we follow an argument from
[14]. Let $g$ be a quadratic polynomial. Recalling the interpretation of $\|\cdot\|_{U_3}^8$ as the average of the
third order derivative of a function, it is easy to see that $\|fg\|_{U_3}^8 = \|f\|_{U_3}^8 < 3\delta/2$. Observe that
the first uniformity "norm" of a function is the square of its expectation. By the monotonicity
of uniformity norms,

$$\langle f, g\rangle = \widehat{fg}(0) \le \|fg\|_{U_1} \le \|fg\|_{U_3} \le O(\delta^{1/8}).$$

Since both $f$ and $g$ are boolean functions, this implies that the distance between $f$ and $g$ is at
least $1/2 - O(\delta^{1/8})$, and the first option of the proposition holds.
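
The sampling step at the start of this proof can be sketched as follows. This is an illustration under stated assumptions, not the paper's procedure in detail: the function is accessed through queries `f(x)` with vectors encoded as integers, the number of samples is left as a free parameter `samples` (the source does not fix it here), and the names `third_derivative`, `estimate_u3_eighth`, `quadratic` and `cubic` are hypothetical.

```python
import random

def third_derivative(f, x, y1, y2, y3):
    """f_{y1,y2,y3}(x): product of f over the 8 vertices of a parallelepiped."""
    value = 1
    for shift in (0, y1, y2, y3, y1 ^ y2, y1 ^ y3, y2 ^ y3, y1 ^ y2 ^ y3):
        value *= f(x ^ shift)
    return value

def estimate_u3_eighth(f, n, samples, rng):
    """Average of random third derivatives; its expectation is the 8th power of the U_3 norm."""
    total = 0
    for _ in range(samples):
        x, y1, y2, y3 = (rng.randrange(2 ** n) for _ in range(4))
        total += third_derivative(f, x, y1, y2, y3)
    return total / samples

def quadratic(x):
    """(-1)^(x_0 x_1 + x_2 x_3 + x_4): a quadratic polynomial."""
    bits = [(x >> i) & 1 for i in range(5)]
    return -1 if (bits[0] & bits[1]) ^ (bits[2] & bits[3]) ^ bits[4] else 1

def cubic(x):
    """(-1)^(x_0 x_1 x_2): a cubic monomial, not a quadratic polynomial."""
    return -1 if (x & 7) == 7 else 1

rng = random.Random(5)
nu_quadratic = estimate_u3_eighth(quadratic, 10, 2000, rng)
nu_cubic = estimate_u3_eighth(cubic, 10, 2000, rng)
```

For the quadratic polynomial every sampled third derivative equals 1, so `nu_quadratic` is exactly 1.0; for the cubic monomial the estimate falls strictly between 0 and 1. Each evaluation uses 8 queries to `f`, so the total query cost is 8 times the number of samples.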

8.2 Proof of Theorem ISTal 

The completeness of the test follows from the fact that it checks whether third order derivatives 
of the function vanish. 

The fact that the soundness of the test is $1/2^{|E|}$ is an immediate consequence of Theorem 5.3
with $k = 3$ together with Theorem 2.3. Indeed, let a boolean function $f$ be $(1/2 - \epsilon)$-far from
quadratic polynomials. Similarly to the proof of Theorem 2.2, the acceptance probability of
the test on a function $f$ is upper bounded by $1/2^{|E|} + \|f\|_{U_3}$. By Theorem 2.3 this is at most
$1/2^{|E|} + \epsilon'$, with $\epsilon' \to 0$ with $\epsilon$.

8.3 Proof of Theorem 4.1

Combining Theorems 6.8 and 6.9 similarly to the proof of Theorem 2.3 we obtain the following
claim.

Theorem 8.1: Let $p$ be a prime number, and let $\epsilon > 0$. Let $G$ be a $p$-group of order $r$ and let
$H$ be a power of $\mathbb{Z}_p$. Let $\phi : G \to H$ such that

$$\Pr_{x,y \in G}\{\phi(x) + \phi(y) = \phi(x+y)\} > \epsilon.$$

Then there exists a homomorphism $\psi : G \to H$ and an element $h \in H$ such that

$$\Pr_{x \in G}\{\phi(x) = \psi(x) + h\} > c \cdot r^{-c'} \cdot \epsilon^{c''},$$

where $c, c', c''$ are absolute constants (independent of the groups $G, H$).

The following lemma concludes the proof of Theorem 4.1.

Lemma 8.2: Let $G$ be a $p$-group of order $r$, and let $H$ be a power of $\mathbb{Z}_p$. Let $\phi : G \to H$ be
such that there exists a homomorphism $\psi : G \to H$ and an element $h \in H$ such that

$$\Pr_{x \in G}\{\phi(x) = \psi(x) + h\} > \delta.$$

Then there exists a homomorphism $\psi' : G \to H$ such that



Pr.^G {Hx) = i^'ix)) >^-S, 



for an absolute constant c. 



Proof: Let $E = \{x \in G : \phi(x) = \psi(x) + h\}$. Let $G = \bigoplus_{i=1}^{m} \mathbb{Z}_{p^{k_i}}$. There exists an absolute
constant $c$, a coordinate $1 \le i \le m$, and a generating element $g \in \mathbb{Z}_{p^{k_i}}$ such that for at least a
^-fraction of the elements of $E$ holds $x_i = g$. Call this set $E'$. Let $e_1 \ldots e_m$ be the standard basis
of $G$. Consider a homomorphism $\psi' : G \to H$ defined as follows: $\psi'(e_j) = \psi(e_j)$ for $j \ne i$ and
$\psi'(g \cdot e_i) = \psi(g \cdot e_i) + h$. Then $\psi'$ agrees with $\phi$ on $E'$. ∎